
Implement LMDB-based multi-modal cache #30373

Open

petersalas wants to merge 1 commit into vllm-project:main from fixie-ai:psalas/lmdb-mm-cache

Conversation

@petersalas
Contributor

@petersalas petersalas commented Dec 10, 2025

Purpose

This implements an LMDB-based multi-modal item cache which supports LRU-based eviction, multiple API server workers, and/or multiple Engine processes.

  • BaseMultiModalProcessorCache/BaseMultiModalReceiverCache now include a begin() method which, in the LMDB implementation, opens a transaction. In the sender-side cache, writes are queued outside of the transaction scope so that processing/serialization all occur outside the write transaction (LMDB has single-writer semantics).
  • Objects are split into ~4KB chunks to avoid pathological LMDB free-list fragmentation (at the expense of cache locality and copy overhead).
  • A single engine process effectively owns the cache: it starts an evictor process and handles resets. The evictor starts operating at 50% utilization and ramps up its aggressiveness (measured as the percentage of time it holds the write lock) as utilization approaches 100%.
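
For illustration, the ~4KB chunking could look like the following minimal sketch. The helper names and the `<hash>:<index>` key scheme are hypothetical, not taken from the PR:

```python
CHUNK_SIZE = 4096  # ~4KB, the chunk size described above


def split_into_chunks(blob: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    # Fixed-size chunks keep LMDB value sizes uniform, which avoids
    # pathological free-list fragmentation at the cost of extra copies.
    return [blob[i : i + chunk_size] for i in range(0, len(blob), chunk_size)]


def chunk_keys(item_hash: str, num_chunks: int) -> list[bytes]:
    # Hypothetical key scheme: one LMDB key per chunk, "<hash>:<index>".
    return [f"{item_hash}:{i}".encode() for i in range(num_chunks)]
```

Reassembly is then just a `b"".join(...)` of the chunks read back in index order.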

One caveat with this implementation is that any item that hasn't been used within a fixed time window (--mm-lmdb-cache-min-eviction-age, defaults to 600 seconds) may be evicted, even if queued requests still depend on it (the worker's execute_model will raise in that case). A future improvement could be to instead track the oldest active request in each frontend (OutputProcessor seemingly already has this) and use that instead.
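
As a rough sketch of the evictor behavior described above: only the 50% starting point, the write-lock duty-cycle measure, and the 600-second default age come from the description; the linear ramp and the helper names are illustrative assumptions.

```python
MIN_EVICTION_AGE_S = 600.0  # default of --mm-lmdb-cache-min-eviction-age


def evictor_duty_cycle(utilization: float, start: float = 0.5) -> float:
    # Fraction of time the evictor holds the write lock: zero below the
    # starting utilization, ramping (here linearly, as an assumption)
    # toward 100% as the cache approaches full.
    if utilization <= start:
        return 0.0
    return min(1.0, (utilization - start) / (1.0 - start))


def is_evictable(last_used_ts: float, now: float) -> bool:
    # An item only becomes an eviction candidate once it has gone unused
    # for at least the minimum eviction age.
    return (now - last_used_ts) >= MIN_EVICTION_AGE_S
```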

Test Plan

Tested across permutations of API/Tensor/Data parallelism.

(Happy to run any suggested benchmarks as well!)

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new LMDB-based multi-modal cache, a significant feature enabling caching across multiple API server workers or engine processes on the same machine. The implementation is well-designed, featuring a dedicated evictor process with an adaptive strategy, object chunking to prevent fragmentation, and transaction management to ensure data consistency while minimizing lock contention. The changes are well-integrated into the existing caching framework, and the new functionality is accompanied by a solid set of tests. I've identified one minor issue regarding unreachable code, which is detailed in a specific comment. Overall, this is a high-quality and well-executed contribution.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread vllm/multimodal/lmdb_cache.py
@ywang96 ywang96 self-assigned this Dec 10, 2025
@mergify
Contributor

mergify Bot commented Dec 10, 2025

Hi @petersalas, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Member

@DarkLight1337 DarkLight1337 left a comment


Some very initial comments

Comment thread vllm/multimodal/lmdb_cache.py Outdated
Comment thread vllm/multimodal/lmdb_cache.py
Comment thread vllm/multimodal/lmdb_cache.py Outdated
Comment thread vllm/multimodal/lmdb_cache.py Outdated
Comment thread vllm/multimodal/lmdb_cache.py Outdated
Comment thread vllm/v1/serial_utils.py
Comment thread vllm/multimodal/lmdb_cache.py Outdated
Comment thread vllm/multimodal/lmdb_cache.py Outdated
Comment thread vllm/multimodal/processing.py Outdated
Comment thread vllm/envs.py Outdated
Comment thread vllm/multimodal/lmdb_cache.py Outdated
@mergify
Contributor

mergify Bot commented Dec 11, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @petersalas.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot removed the needs-rebase label Dec 11, 2025
@mergify
Contributor

mergify Bot commented Dec 15, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @petersalas.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify
Contributor

mergify Bot commented Dec 17, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @petersalas.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@DarkLight1337
Member

DarkLight1337 commented Jan 15, 2026

Heads-up that I have implemented the refactor to the cache factories in #32382, and have added you as co-author.

@mergify mergify Bot removed the needs-rebase label Jan 23, 2026

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

Comment @cursor review or bugbot run to trigger another review on this PR

Comment thread vllm/envs.py Outdated
@mergify
Contributor

mergify Bot commented Jan 24, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @petersalas.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Jan 24, 2026
Comment thread vllm/multimodal/lmdb_cache.py Outdated
Comment thread tests/multimodal/test_cache.py Outdated
@petersalas petersalas force-pushed the psalas/lmdb-mm-cache branch from 6661fa8 to 3a190ad Compare January 31, 2026 05:05
@mergify mergify Bot removed the needs-rebase label Jan 31, 2026
Comment thread vllm/multimodal/cache.py Outdated
@mergify
Contributor

mergify Bot commented Mar 11, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @petersalas.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Mar 11, 2026
@petersalas petersalas force-pushed the psalas/lmdb-mm-cache branch from 3a190ad to 5943789 Compare April 3, 2026 17:15
@petersalas petersalas requested a review from njhill as a code owner April 3, 2026 17:15
Signed-off-by: Peter Salas <peter@fixie.ai>
@petersalas petersalas force-pushed the psalas/lmdb-mm-cache branch from 5943789 to b4d18a7 Compare April 3, 2026 18:16
@mergify mergify Bot removed the needs-rebase label Apr 3, 2026
@petersalas
Contributor Author

As one might reasonably expect, the LMDB cache is marginally slower than the shm cache in a microbenchmark, but in exchange it supports API-server scale-out, which can significantly improve tail latency in multi-modal-heavy inference scenarios.

> vllm bench mm-processor --model Qwen/Qwen2.5-VL-3B-Instruct --dataset-name random-mm --num-prompts 200 --tensor-parallel-size 1 --mm-processor-cache-type shm --mm-processor-cache-gb 2 --max-model-len 8192

================================================================================
Multimodal Processor Benchmark Results
================================================================================

MM Processor Metrics:
                     Stage  Mean Median   Std P99.0
          get_mm_hashes_ms  0.15   0.09  0.07  0.25
get_cache_missing_items_ms  0.01   0.01  0.00  0.02
     apply_hf_processor_ms 20.88   3.50 19.03 47.35
        merge_mm_kwargs_ms  5.16   1.05  4.62 10.70
   apply_prompt_updates_ms  3.04   1.04  2.22  5.77
     preprocessor_total_ms 29.24   5.62 25.93 63.14
        encoder_forward_ms 34.67  35.86 16.56 70.56
         num_encoder_calls  1.00   1.00  0.00  1.00

Summary: 200 total encoder calls across 200 requests.

End-to-End Latency (ms):
Metric Value (ms)
  Mean   18606.72
Median   17734.48
   Std    4505.58
 P99.0   29678.76

> vllm bench mm-processor --model Qwen/Qwen2.5-VL-3B-Instruct --dataset-name random-mm --num-prompts 200 --tensor-parallel-size 1 --mm-processor-cache-type lmdb --mm-processor-cache-gb 2 --max-model-len 8192

================================================================================
Multimodal Processor Benchmark Results
================================================================================

MM Processor Metrics:
                     Stage  Mean Median   Std P99.0
          get_mm_hashes_ms  0.16   0.11  0.08  0.32
get_cache_missing_items_ms  0.02   0.01  0.00  0.03
     apply_hf_processor_ms 22.09   4.18 20.42 61.18
        merge_mm_kwargs_ms  1.80   0.36  1.65  4.72
   apply_prompt_updates_ms  3.18   1.30  2.31  6.89
     preprocessor_total_ms 27.24   5.66 24.39 70.14
        encoder_forward_ms 34.70  29.71 15.50 70.51
         num_encoder_calls  1.00   1.00  0.00  1.00

Summary: 200 total encoder calls across 200 requests.

End-to-End Latency (ms):
Metric Value (ms)
  Mean   18451.56
Median   16108.19
   Std    5463.13
 P99.0   31827.07

@petersalas petersalas requested a review from DarkLight1337 April 3, 2026 21:56
Member

@DarkLight1337 DarkLight1337 left a comment


Asked @ywang96 to take a look since I did a pass before already


Labels

ci/build multi-modality Related to multi-modality (#4194) v1

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

4 participants